poisoning rate
_NeurIPS2023_CR__Certified_Backdoor_Detection.pdf
The main purpose of this research is to provide the user of DNN classifiers with a method to detect if the model is backdoor attacked without access to the training set. All attacks used to evaluate our detection method in this paper are created by published backdoor attack strategies on public datasets. Thus, we did not create new threats to society. Moreover, our work provides a new perspective on backdoor defense, as it is the first to address the certification of backdoor detection. It helps other researchers to understand the behavior of deep learning systems facing malicious activities. While existing backdoor detectors are all empirical [67, 20, 75, 41, 69, 6, 56, 13], our work initiates a new research direction - backdoor detection with certification. Moreover, we first exposed that certified backdoor detectors and certified robustness against backdoor attacks complement each other [86, 71, 27, 53].
Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense
However, Does achieving a low ASR through current safety purification methods truly eliminate learned backdoor features from the pretraining phase? In this paper, we provide an affirmative answer to this question by thoroughly investigating the Post-Purification Robustness of current backdoor purification methods.
Checklist 1. For all authors (a)
Do the main claims made in the abstract and introduction accurately reflect the paper's Did you discuss any potential negative societal impacts of your work? Did you state the full set of assumptions of all theoretical results? If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] The code will Did you specify all the training details (e.g., data splits, hyperparameters, how they Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? Did you include the total amount of compute and the type of resources used (e.g., type Did you include any new assets either in the supplemental material or as a URL? [N/A] Did you discuss whether and how consent was obtained from people whose data you're If you used crowdsourcing or conducted research with human subjects... (a) We trained backdoored model for 100 epochs using Stochastic Gradient Descent (SGD) with an initial learning rate of 0.1 on CIFAR-10 and the ImageNet subset (0.01 on GTSRB), a weight decay of The learning rate was divided by 10 at the 20th and the 70th epochs. The details of backdoor triggers are summarized in Table 5. ASR: attack success rate; CA: clean accuracy.
_NeurIPS2023_CR__Certified_Backdoor_Detection.pdf
Thus, we did not create new threats to society. Moreover, our work provides a new perspective on backdoor defense, as it is the first to address the certification of backdoor detection. This assumption holds in general in practice. In our setting, this is reflected by a small samplewise local probability for the labeled class for most samples used for computing LDP, which may easily lead to a large LDP . In the following, we show that a larger deviation of the learned decision boundary of a binary Bayesian classifier will affect its LDP .